
fix(quantization): emit axis on DequantizeLinear for per-channel dynamic quantization#28228

Open
Rishi-Dave wants to merge 1 commit into microsoft:main from Rishi-Dave:rishidave/fix/dynamic-quant-per-channel-dequant-axis

Conversation

@Rishi-Dave
Contributor

Summary

  • Fix quantize_dynamic(per_channel=True) so weights quantized per-channel produce a DequantizeLinear node with the correct axis attribute.
  • Stop dropping the channel axis when quantize_weight_per_channel populates QuantizedValue (was hardcoded to None).
  • Gate the scalar-scale assertion in _dequantize_value on axis is None so per-channel scales (1-D tensors) are accepted.

Motivation

Fixes #19997.

When a model is quantized with quantize_dynamic(..., per_channel=True) and a per-channel weight reaches _dequantize_value (e.g. via _dequantize_outputs when the weight is in the graph outputs), two bugs surface:

  1. quantize_weight_per_channel stores QuantizedValue.axis = None even though it received a real channel_axis, so the per-channel information is lost.
  2. _dequantize_value (a) asserts scale_init.size == 1, which fails for a 1-D per-channel scale, and (b) builds the DequantizeLinear node without an axis attribute, producing an invalid ONNX node when the model is consumed.
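To see why the missing axis is more than cosmetic, here is a small NumPy illustration (not onnxruntime code) of what DequantizeLinear computes with a per-channel scale along axis=1:

```python
import numpy as np

# Illustration only: per-channel dequantization applies one scale per
# slice along the channel axis, here axis=1 (columns).
q = np.array([[10, 20], [30, 40]], dtype=np.int8)
scale = np.array([0.1, 0.01], dtype=np.float32)   # one scale per column
zero_point = np.array([0, 0], dtype=np.int8)

# DequantizeLinear with axis=1 broadcasts the 1-D scale across columns:
deq = (q.astype(np.float32) - zero_point.astype(np.float32)) * scale
# deq is approximately [[1.0, 0.2], [3.0, 0.4]]
```

A node emitted without the axis attribute is interpreted as per-tensor, which requires a scalar scale; a 1-D scale then makes the node invalid, which is exactly the failure mode above.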

PR #22283 (Nov 2024) softened the assertion against None-typed scales but left the underlying axis-propagation bug in place.

Changes

  • onnxruntime/python/tools/quantization/onnx_quantizer.py
    • quantize_weight_per_channel: pass channel_axis (was None) into QuantizedValue.
    • _dequantize_value: only require a scalar scale on the per-tensor path (axis is None); forward axis=quantized_value.axis to onnx.helper.make_node("DequantizeLinear", ...). make_node silently omits the attribute when axis is None, so the per-tensor path is unchanged.
  • onnxruntime/test/python/quantization/test_quant_issues.py
    • New regression test test_dynamic_quantize_per_channel_emits_axis_attribute that builds a minimal MatMul model with the weight routed to a graph output (to force the _dequantize_outputs -> _dequantize_value path), runs quantize_dynamic(per_channel=True), and asserts the emitted DequantizeLinear has the axis attribute and a 1-D multi-element scale initializer.
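The control flow of the fix can be sketched roughly as follows (simplified stand-ins, not the actual onnx_quantizer.py code; QuantizedValue here is a minimal dataclass):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class QuantizedValue:
    name: str
    axis: Optional[int]  # fix 1: now carries channel_axis instead of None

def dequantize_value(qv: QuantizedValue, scale_size: int) -> dict:
    # Fix 2a: the scalar-scale assertion only applies on the
    # per-tensor path (axis is None).
    if qv.axis is None:
        assert scale_size == 1, "per-tensor scale must be scalar"
    # Fix 2b: forward the axis; mirroring make_node's behavior, a None
    # axis simply omits the attribute, so the per-tensor path is unchanged.
    attrs = {} if qv.axis is None else {"axis": qv.axis}
    return {"op_type": "DequantizeLinear", **attrs}

# A per-channel weight with an 8-element scale now keeps its axis:
node = dequantize_value(QuantizedValue("weight_quantized", axis=1), scale_size=8)
```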

Test Plan

  • python -m pytest onnxruntime/test/python/quantization/test_quant_issues.py -xvs — new test passes; existing test skipped as before.
  • python -m pytest onnxruntime/test/python/quantization/test_op_matmul.py — 7 passed, 8 skipped (no regression).
  • python -m pytest onnxruntime/test/python/quantization/test_qdq.py -k per_channel — 1 passed.
  • lintrunner -a on changed files: clean.

`quantize_weight_per_channel` was storing `None` as the axis in the
`QuantizedValue` map entry instead of the actual `channel_axis` argument.
As a result, `_dequantize_value` would hit an AssertionError (scale not
scalar) when the per-channel-quantized weight was also a graph output, and
even on success it would emit a `DequantizeLinear` node with no `axis`
attribute, producing semantically incorrect per-tensor dequantization.

Fix:
- Pass `channel_axis` (not `None`) when constructing `QuantizedValue` in
  `quantize_weight_per_channel`.
- Gate the scalar-scale assertion in `_dequantize_value` on
  `quantized_value.axis is None` (only required for per-tensor).
- Forward `axis=quantized_value.axis` to `onnx.helper.make_node` for
  `DequantizeLinear`; `make_node` ignores `axis=None` automatically, so
  the per-tensor path is unaffected.

Add regression test `test_dynamic_quantize_per_channel_emits_axis_attribute`
that builds a minimal MatMul model with the weight also exposed as a graph
output (so `_dequantize_outputs` fires on the per-channel weight), confirms
quantization completes without error, and asserts the `axis` attribute is
present on the resulting `DequantizeLinear` node with a multi-element scale.
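The shape of the test's core check can be sketched like this (plain dicts as stand-ins for ONNX NodeProto objects; not the actual test code):

```python
# Hypothetical stand-in: collect the axis attribute from every
# DequantizeLinear node in a quantized graph, as the regression test
# does on the real NodeProto objects.
def dequantize_axes(nodes):
    return [n.get("attrs", {}).get("axis")
            for n in nodes if n["op_type"] == "DequantizeLinear"]

quantized_graph = [
    {"op_type": "MatMulInteger"},
    {"op_type": "DequantizeLinear", "attrs": {"axis": 1}},
]
axes = dequantize_axes(quantized_graph)
# axis present -> per-channel semantics preserved
```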

Fixes microsoft#19997

Copilot AI left a comment


Pull request overview

Fixes per-channel dynamic quantization so that per-channel weight quantization correctly propagates the channel axis into emitted DequantizeLinear nodes (and relaxes the scalar-scale assertion accordingly), addressing a failure mode reported in #19997.

Changes:

  • Preserve channel_axis when creating QuantizedValue for per-channel quantized weights.
  • Update _dequantize_value to (1) only enforce scalar scale for per-tensor quantization (axis is None) and (2) emit DequantizeLinear(axis=...) for per-channel cases.
  • Add a regression test ensuring quantize_dynamic(per_channel=True) emits a DequantizeLinear with an axis attribute and a 1-D per-channel scale initializer.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
onnxruntime/python/tools/quantization/onnx_quantizer.py Propagates per-channel axis into QuantizedValue and forwards it to DequantizeLinear; gates scalar-scale assertion to per-tensor path.
onnxruntime/test/python/quantization/test_quant_issues.py Adds regression coverage that validates axis emission and per-channel (multi-element) scale initializer shape.


Comment on lines +91 to +95
# Build a model: input (5, 4) @ weight (4, 8) -> output (5, 8).
# The weight is also passed through Identity and exposed as a second graph
# output so that _dequantize_outputs calls _dequantize_value on the
# per-channel-quantized weight initializer.
# Weight axis=1 is the output-feature axis (per-channel quantization target).

Copilot AI Apr 25, 2026


The test docstring/comments suggest the _dequantize_outputs -> _dequantize_value path is exercised because the per-channel weight is a graph output, but this model outputs weight_out (Identity output), not the initializer weight. In practice the DequantizeLinear insertion here is likely triggered when the quantizer processes the unsupported Identity and dequantizes its (now-quantized) weight input. Updating the comment/docstring to match the actual mechanism would make the regression intent clearer and avoid confusion for future maintainers.



Development

Successfully merging this pull request may close these issues.

[bug] quantize_dynamic results in AttributeError: 'NoneType' object has no attribute 'HasField'
